New improvements in decoding speed and latency for automatic captioning

نویسندگان

  • Jian Xue
  • Rusheng Hu
  • Yunxin Zhao
چکیده

In this paper, we present new improvements in decoding speed and latency for automatic captioning in telehealth. Complementary local word confidence scores are used to prune uncompetitive search paths. Subspace distribution clustering hidden Markov modeling (SDCHMM) is used for fast generation of acoustic and local confidence scores, where overlap accumulative probability (OAP) is used to measure the similarity of Gaussian pdf’s in SDCHMM. We propose to use pre-backtrace based on detection of prosodic boundaries defined by unfilled pauses, filled pauses, as well as pitch contour to decrease latency. Experiments were conducted on a telehealth captioning task with vocabulary sizes of 21 K and 46 K. The proposed methods led to 33% improvement in decoding speed without loss of word accuracy, and to 3 folds of decrease in maximum latency with about 1.6% loss of word accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards automatic closed captioning : low latency real time broadcast news transcription

In this paper, we present a low latency real-time Broadcast News recognition system capable of transcribing live television newscasts with reasonable accuracy. We describe our recent modeling and efficiency improvements that yield a 22% word error rate on the Hub4e98 test set while running faster than real-time. These include the discriminative training of a feature transform and the acoustic m...

متن کامل

Reinforced Video Captioning with Entailment Rewards

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic m...

متن کامل

Multi-Task Video Captioning with Video and Entailment Generation

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generati...

متن کامل

A New Design for Two-input XOR Gate in Quantum-dot Cellular Automata

Quantum-dot Cellular Automata (QCA) technology is attractive due to its low power consumption, fast speed and small dimension, therefore, it is a promising alternative to CMOS technology. In QCA, configuration of charges plays the role which is played by current in CMOS. This replacement provides the significant advantages. Additionally, exclusive-or (XOR) gate is a useful building block in man...

متن کامل

Image Representations and New Domains in Neural Image Captioning

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a state-of-theart neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006